Abstract

A variation of the Domain Wall operator with an additional parameter
$\alpha$ will be introduced. The conditioning of the new Domain Wall
operator depends on $\alpha$, whereas the corresponding 4D propagator does
not. The new and the conventional Domain Wall operator agree for
$\alpha = 1$. By tuning $\alpha$, speed-ups of the linear system solvers of
around 20% could be achieved.


1 Introduction 

A variation of the Domain Wall operator is suggested here. It introduces a
parameter $\alpha$ that appears only as a global factor in the 4D matrix
elements. The generalization is therefore simple in structure, and the
Domain Wall formalism and its reduction to the 4D Overlap formalism can be
used almost unchanged. Details about the Domain Wall and the Overlap
formalism, and how they can be translated into each other, can be found in
the literature [...]. As a reference for notation and for the sake of
completeness, the standard 5D to 4D reduction is rederived in appendices A
and B.
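2 The better conditioned Domain Wall operator

The new Domain Wall operator introduces an additional parameter $\alpha$,

$$
D_\alpha(m) =
\begin{pmatrix}
D_{1+}(P_- + \alpha P_+) & \alpha D_{1-}P_- & 0 & \cdots & -m D_{1-}P_+ \\
\alpha D_{2-}P_+ & \alpha D_{2+} & \alpha D_{2-}P_- & \cdots & 0 \\
0 & \alpha D_{3-}P_+ & \alpha D_{3+} & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
-m D_{L_s-}P_- & 0 & \cdots & \alpha D_{L_s-}P_+ & D_{L_s+}(P_+ + \alpha P_-)
\end{pmatrix}
\tag{1}
$$

with

$$
D_{i+} = b_i D_w + 1, \qquad D_{i-} = c_i D_w - 1, \tag{2}
$$

$$
P_+ = \tfrac{1}{2}(1 + \gamma_5), \qquad P_- = \tfrac{1}{2}(1 - \gamma_5). \tag{3}
$$

$D_w$ denotes the Wilson Dirac matrix

$$
D_w(M_5) = (4 + M_5)\,\delta_{x,y}
 - \tfrac{1}{2}\sum_\mu \left[ (1 - \gamma_\mu)\,U_\mu(x)\,\delta_{x+\hat\mu,y}
 + (1 + \gamma_\mu)\,U_\mu^\dagger(y)\,\delta_{x,y+\hat\mu} \right]. \tag{4}
$$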
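Multiplying eq. (1) from the right with $P$ (see appendix A, where
$Q_{i\pm}$ and $c_\pm$ are also defined) leads to

\begin{align}
D_\alpha P &= D_\alpha
\begin{pmatrix}
P_- & P_+ & 0 & \cdots & 0 \\
0 & P_- & P_+ & \cdots & 0 \\
0 & 0 & P_- & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
P_+ & 0 & 0 & \cdots & P_-
\end{pmatrix} \tag{5} \\
&= \gamma_5
\begin{pmatrix}
Q_{1-}c_- & \alpha Q_{1+} & 0 & \cdots & 0 \\
0 & \alpha Q_{2-} & \alpha Q_{2+} & \cdots & 0 \\
0 & 0 & \alpha Q_{3-} & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
Q_{L_s+}c_+ & 0 & 0 & \cdots & \alpha Q_{L_s-}
\end{pmatrix} \tag{6} \\
&= D_1 P
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & \alpha & 0 & \cdots & 0 \\
0 & 0 & \alpha & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \alpha
\end{pmatrix} \tag{7} \\
&= D_1 P A. \tag{8}
\end{align}

To find the 4D propagator, eq. (38) has to be solved,

$$
D_1(m) P y = D_1(1) P b,
$$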
with source $b$ and 4D propagator $y_1$. The independence of the 4D
propagator from $\alpha$ follows directly,

$$
D_1(1) P b = D_1(m) P y = D_\alpha(m) P A^{-1} y = D_\alpha(m) P z, \tag{9}
$$

with $Az = y$ and therefore $z_1 = y_1$.
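
The argument above can be checked numerically on a small toy problem. The following is a minimal sketch, not the code used for the results: it assumes a dense random matrix as a stand-in for $D_w$, a diagonal $\gamma_5$, and at least three slices in the 5th dimension; the helper names `projectors`, `block`, `domain_wall` and `shift_P` are hypothetical. It assembles $D_\alpha(m)$ from eq. (1), verifies $D_\alpha P = D_1 P A$, and compares the first 4D block of the solutions for $\alpha = 0.55$ and $\alpha = 1$.

```python
import numpy as np

def projectors(n):
    """P+ and P- for a toy 4D space of even dimension n (gamma5 diagonal)."""
    gamma5 = np.diag(np.r_[np.ones(n // 2), -np.ones(n // 2)])
    return 0.5 * (np.eye(n) + gamma5), 0.5 * (np.eye(n) - gamma5)

def block(i, j, n):
    """Index of the n x n block (i, j) inside the 5D matrix."""
    return slice(i * n, (i + 1) * n), slice(j * n, (j + 1) * n)

def domain_wall(Dw, b, c, m, alpha):
    """Dense D_alpha(m) following the block layout of eq. (1)."""
    Ls, n = len(b), Dw.shape[0]
    assert Ls >= 3, "block layout assumes at least three 5D slices"
    Pp, Pm = projectors(n)
    D = np.zeros((Ls * n, Ls * n))
    for i in range(Ls):
        Dplus = b[i] * Dw + np.eye(n)    # D_{i+}, eq. (2)
        Dminus = c[i] * Dw - np.eye(n)   # D_{i-}, eq. (2)
        if i == 0:
            D[block(i, i, n)] = Dplus @ (Pm + alpha * Pp)
        elif i == Ls - 1:
            D[block(i, i, n)] = Dplus @ (Pp + alpha * Pm)
        else:
            D[block(i, i, n)] = alpha * Dplus
        # Hopping terms; the two boundary links carry -m instead of alpha.
        up = alpha if i < Ls - 1 else -m
        down = alpha if i > 0 else -m
        D[block(i, (i + 1) % Ls, n)] = up * Dminus @ Pm
        D[block(i, (i - 1) % Ls, n)] = down * Dminus @ Pp
    return D

def shift_P(Ls, n):
    """Matrix P of eq. (5): P- on the diagonal, P+ on the cyclic superdiagonal."""
    Pp, Pm = projectors(n)
    P = np.zeros((Ls * n, Ls * n))
    for i in range(Ls):
        P[block(i, i, n)] = Pm
        P[block(i, (i + 1) % Ls, n)] = Pp
    return P

rng = np.random.default_rng(0)
n, Ls, m, alpha = 4, 4, 0.06, 0.55
Dw = 0.1 * rng.standard_normal((n, n))   # random stand-in, not a Wilson matrix
b = c = np.ones(Ls)
P = shift_P(Ls, n)
A = np.kron(np.diag([1.0] + [alpha] * (Ls - 1)), np.eye(n))

D_alpha = domain_wall(Dw, b, c, m, alpha)
D_one = domain_wall(Dw, b, c, m, 1.0)
identity_holds = np.allclose(D_alpha @ P, D_one @ P @ A)   # eqs. (5)-(8)

source = np.zeros(Ls * n)
source[:n] = rng.standard_normal(n)                  # b = (b_1, 0, ..., 0)
rhs = domain_wall(Dw, b, c, 1.0, 1.0) @ P @ source   # D_1(1) P b
z = np.linalg.solve(D_alpha @ P, rhs)                # eq. (9)
y = np.linalg.solve(D_one @ P, rhs)
same_propagator = np.allclose(z[:n], y[:n], rtol=1e-6, atol=1e-10)
```

In this sketch only the blocks $z_2, \dots, z_{L_s}$ differ from those of $y$ (they are rescaled by $1/\alpha$), which is the content of $Az = y$.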

3 Results 

In this section, the $\alpha$ dependence of the conditioning of $D_\alpha$
is presented. The computations were done on 3 MILC gauge fields of size
$16^3 \times 32$, downloaded from NERSC. The conjugate gradient method on
the normal equation was used to solve eq. (9).
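
For orientation, conjugate gradient on the normal equation can be sketched as follows. This is a generic textbook variant under stated assumptions (a dense NumPy matrix `M` in place of the preconditioned operator, and a relative stopping criterion on the normal-equation residual); the function name `cg_normal_equations` and the stopping rule are illustrative and not taken from the solver used for the results. The returned iteration count is the quantity compared between different $\alpha$ values.

```python
import numpy as np

def cg_normal_equations(M, rhs, tol=1e-8, max_iter=10_000):
    """Solve M x = rhs by CG on M^H M x = M^H rhs; returns (x, iterations)."""
    Mh = M.conj().T
    x = np.zeros_like(rhs)
    r = Mh @ rhs                       # normal-equation residual for x = 0
    rr = np.vdot(r, r).real
    if rr == 0.0:
        return x, 0                    # zero right-hand side, nothing to do
    target = tol * np.sqrt(rr)         # relative stopping threshold
    p = r.copy()
    for k in range(1, max_iter + 1):
        Mp = M @ p
        step = rr / np.vdot(Mp, Mp).real
        x += step * p
        r -= step * (Mh @ Mp)
        rr_new = np.vdot(r, r).real
        if np.sqrt(rr_new) <= target:
            return x, k
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter

# Small usage example on a well-conditioned random matrix.
rng = np.random.default_rng(1)
M = np.eye(20) + 0.1 * rng.standard_normal((20, 20))
rhs = rng.standard_normal(20)
x, n_iter = cg_normal_equations(M, rhs)
```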

The red black preconditioned version of $D_\alpha$ was used in the form,

$$
D_{bb} = 1_{bb} - I_{bb}^{-1} D_{br} I_{rr}^{-1} D_{rb}. \tag{10}
$$

This version of red black preconditioning allows for an efficient use of the
Zolotarev approximation to the sign function. This is contrary to what has
been said in [...], where we used the matrix,

$$
D_{bb} = I_{bb} - D_{br} I_{rr}^{-1} D_{rb}, \tag{11}
$$

instead. The reason is that the rows of eq. (11) with large Zolotarev
coefficients cause the convergence to slow down. This behaviour can be
improved by scaling all rows that contain a Zolotarev coefficient larger than
one with a factor equal to the inverse of the Zolotarev coefficient. This can
be seen as a preconditioning from the left. An even better method is to
take eq. (10), where the preconditioning from the left cancels out and where
the weighting of the rows is done automatically.
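
The cancellation of a row scaling can be illustrated with a small numerical sketch. It rests on an assumption about notation: $I_{bb}$ and $I_{rr}$ are read here as the site-diagonal blocks on black and red sites, and $D_{br}$, $D_{rb}$ as the hopping blocks; all matrices below are random stand-ins and the function names are hypothetical. Scaling the rows of the full operator by diagonal matrices $S_b$ and $S_r$ leaves the form of eq. (10) unchanged, while the form of eq. (11) picks up the factor $S_b$ from the left.

```python
import numpy as np

def schur_form_10(I_bb, D_br, I_rr, D_rb):
    """1 - I_bb^{-1} D_br I_rr^{-1} D_rb, the form of eq. (10)."""
    inner = D_br @ np.linalg.solve(I_rr, D_rb)
    return np.eye(I_bb.shape[0]) - np.linalg.solve(I_bb, inner)

def schur_form_11(I_bb, D_br, I_rr, D_rb):
    """I_bb - D_br I_rr^{-1} D_rb, the form of eq. (11)."""
    return I_bb - D_br @ np.linalg.solve(I_rr, D_rb)

rng = np.random.default_rng(2)
nb = nr = 6
I_bb = np.eye(nb) + 0.1 * rng.standard_normal((nb, nb))
I_rr = np.eye(nr) + 0.1 * rng.standard_normal((nr, nr))
D_br = rng.standard_normal((nb, nr))
D_rb = rng.standard_normal((nr, nb))

# Row scaling of the full operator (a preconditioning from the left).
S_b = np.diag(rng.uniform(0.1, 10.0, nb))
S_r = np.diag(rng.uniform(0.1, 10.0, nr))
blocks = (I_bb, D_br, I_rr, D_rb)
scaled = (S_b @ I_bb, S_b @ D_br, S_r @ I_rr, S_r @ D_rb)

form_10_invariant = np.allclose(schur_form_10(*scaled), schur_form_10(*blocks))
form_11_rescaled = np.allclose(schur_form_11(*scaled), S_b @ schur_form_11(*blocks))
```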

The same behaviour can be observed for Möbius coefficients $b_i$ and $c_i$
larger than one.

Let $n_i(\alpha)$ be the number of iterations for the residual to be of the
order of $O(10^{-8})$, where $i$ runs over the color and Dirac source
indices and over the three gauge fields. The graphs in this section show the
relative count $n_i(\alpha)/n_i(\alpha = 1)$, together with the standard
deviation, for a series of $\alpha$ values.
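
The plotted quantity can be computed as in the following sketch. The array layout (`counts[k, i]` holding the iteration count for the k-th $\alpha$ value and the i-th source or gauge field), the function name, and the numbers in the usage example are made-up placeholders for illustration only; they are not measured data.

```python
import numpy as np

def relative_iteration_count(counts, alphas, reference=1.0):
    """Mean and standard deviation of n_i(alpha) / n_i(alpha = reference).

    counts[k, i] is the iteration count for alphas[k] and source i.
    """
    counts = np.asarray(counts, dtype=float)
    ref_row = counts[list(alphas).index(reference)]
    ratios = counts / ref_row          # ratio formed per source, then averaged
    return ratios.mean(axis=1), ratios.std(axis=1)

# Placeholder numbers, two alpha values and two sources.
alphas = [0.5, 1.0]
counts = [[80, 90],
          [100, 120]]
mean_ratio, std_ratio = relative_iteration_count(counts, alphas)
```

Forming the ratio per source before averaging keeps the reference point $\alpha = 1$ at exactly one with zero spread.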

For the remaining parameters, the quark mass, the size of the 5th dimension
$L_s$ and the Möbius coefficients $b_i$ and $c_i$, the optimal $\alpha$
values and speed-ups are summarised in the following table.

Figure 1: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.0, c = 1.0, m = 0.06,
$L_s$ = 4.


Mass    $L_s$    $b_i$, $c_i$    Best $\alpha$    Speed-up
0.06    4        1, 1            0.55             25%
0.06    6        1, 1            0.55             24%
0.06    8        1, 1            0.55             22%
0.06    10       1, 1            0.6              19%
0.06    12       1, 1            0.6              17%
0.01    8        1, 1            0.55             23%
0.06    8        1.7, 0.7        0.6              20%
0.06    10       Zolotarev       0.4              17%


Acknowledgment: I thank Richard Brower and Kostas Orginos for discussions
and Tony Kennedy for discussions and the code to compute Zolotarev
coefficients.

Figure 2: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.0, c = 1.0, m = 0.06,
$L_s$ = 6.

Figure 3: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.0, c = 1.0, m = 0.06,
$L_s$ = 8.

Figure 4: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.0, c = 1.0, m = 0.06,
$L_s$ = 10.

Figure 5: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.0, c = 1.0, m = 0.06,
$L_s$ = 12.

Figure 6: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.0, c = 1.0, m = 0.01,
$L_s$ = 8.

Figure 7: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$. Panel title: b = 1.7, c = 0.7, m = 0.06,
$L_s$ = 8.

Figure 8: Relative iteration count $n_i(\alpha)/n_i(\alpha = 1)$ for 3 gauge
fields of size $16^3 \times 32$, with $b_i = c_i$. Panel title: b = 1.0,
c = 1.0, m = 0.06, $L_s$ = 10, Zolotarev.
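A Domain Wall to Overlap transformation

To keep notation simple, we perform the transformation with $L_s = 4$ sites
in the 5th dimension. A generalisation to arbitrary $L_s$ is straightforward.
The Domain Wall to Overlap transformation reads,

$$
L D_{DW}(m) R = F D^5_{ov}(m). \tag{12}
$$

The transformation matrices take the form (for $L_s = 4$ sites in the 5th
dimension),

$$
F = L D_{DW}(1) R, \tag{13}
$$

and

$$
\begin{aligned}
L &= L_1 L_2 =
\begin{pmatrix}
1 & S_1 & S_1S_2 & S_1S_2S_3 \\
0 & 1 & S_2 & S_2S_3 \\
0 & 0 & 1 & S_3 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
Q_{1-}^{-1} & 0 & 0 & 0 \\
0 & Q_{2-}^{-1} & 0 & 0 \\
0 & 0 & Q_{3-}^{-1} & 0 \\
0 & 0 & 0 & Q_{4-}^{-1}
\end{pmatrix} \gamma_5, \\
R &= P R_1 =
\begin{pmatrix}
P_- & P_+ & 0 & 0 \\
0 & P_- & P_+ & 0 \\
0 & 0 & P_- & P_+ \\
P_+ & 0 & 0 & P_-
\end{pmatrix}
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
-S_2S_3S_4c_+ & 1 & 0 & 0 \\
-S_3S_4c_+ & 0 & 1 & 0 \\
-S_4c_+ & 0 & 0 & 1
\end{pmatrix}, \\
D^5_{ov}(m) &=
\begin{pmatrix}
D^4_{ov}(m) & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\end{aligned}
\tag{14}
$$

The matrix entries are defined as follows,

$$
\begin{aligned}
Q_{i+} &= \gamma_5 D_w (b_i P_+ + c_i P_-) + 1, \qquad
Q_{i-} = \gamma_5 D_w (b_i P_- + c_i P_+) - 1, \\
S_i &= T_i^{-1} = -Q_{i-}^{-1} Q_{i+}, \\
c_+ &= P_+ - m P_-, \qquad c_- = P_- - m P_+.
\end{aligned}
\tag{15}
$$

$T_i$ is called the transfer matrix.

The matrix multiplications will be performed in the following order,

$$
L_1 L_2 D_{DW}(m) P R_1 = L_1 L_2 M_1 R_1 = L_1 M_2 R_1 = L_1 M_3 = M_4. \tag{16}
$$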
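Step 1:

$$
M_1 = D_{DW}(m) P = \gamma_5
\begin{pmatrix}
Q_{1-}c_- & Q_{1+} & 0 & 0 \\
0 & Q_{2-} & Q_{2+} & 0 \\
0 & 0 & Q_{3-} & Q_{3+} \\
Q_{4+}c_+ & 0 & 0 & Q_{4-}
\end{pmatrix}
\tag{17}
$$

with

$$
\begin{aligned}
Q_{i-} &= \gamma_5 (D_{i+}P_- + D_{i-}P_+) \\
&= \gamma_5 \big( D_w (b_i P_- + c_i P_+) + P_- - P_+ \big) \\
&= \gamma_5 D_w (b_i P_- + c_i P_+) - 1,
\end{aligned}
\tag{18}
$$

$$
\begin{aligned}
Q_{i+} &= \gamma_5 (D_{i+}P_+ + D_{i-}P_-) \\
&= \gamma_5 \big( D_w (b_i P_+ + c_i P_-) + P_+ - P_- \big) \\
&= \gamma_5 D_w (b_i P_+ + c_i P_-) + 1.
\end{aligned}
\tag{19}
$$

Step 2:

$$
M_2 = L_2 M_1 =
\begin{pmatrix}
c_- & -S_1 & 0 & 0 \\
0 & 1 & -S_2 & 0 \\
0 & 0 & 1 & -S_3 \\
-S_4c_+ & 0 & 0 & 1
\end{pmatrix}
\tag{20}
$$

Step 3:

$$
M_3 = M_2 R_1 =
\begin{pmatrix}
-c_- + S_1S_2S_3S_4c_+ & -S_1 & 0 & 0 \\
0 & 1 & -S_2 & 0 \\
0 & 0 & 1 & -S_3 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{21}
$$

Step 4:

$$
L D_{DW}(m) R = M_4 = L_1 M_3 =
\begin{pmatrix}
-c_- + S_1S_2S_3S_4c_+ & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{22}
$$

This leads to,

$$
F = L D_{DW}(1) R =
\begin{pmatrix}
(1 + S_1S_2S_3S_4)\gamma_5 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{23}
$$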
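To make notation simpler, we define $S = S_1 S_2 S_3 S_4$. The 5D Overlap
Operator takes the form,

$$
D^5_{ov}(m) = F^{-1} M_4 =
\begin{pmatrix}
\gamma_5 (1 + S)^{-1} (-c_- + S c_+) & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{24}
$$

It follows for the (11) element,

$$
\begin{aligned}
D^4_{ov}(m) &= \gamma_5 (1 + S)^{-1}\, \tfrac{1}{2}
 \big( m + m\gamma_5 - 1 + \gamma_5 + S (1 + \gamma_5 - m + m\gamma_5) \big) \\
&= \gamma_5 (1 + S)^{-1}\, \tfrac{1}{2}
 \big( (1 + m)(S + 1)\gamma_5 + (1 - m)(S - 1) \big) \\
&= \tfrac{1}{2} \left( (1 + m) + (1 - m)\, \gamma_5\, \frac{S - 1}{S + 1} \right).
\end{aligned}
\tag{25}
$$

Hence eq. (24) takes the form,

$$
D^5_{ov}(m) =
\begin{pmatrix}
\tfrac{1}{2} \left( (1 + m) + (1 - m)\, \gamma_5\, \frac{S - 1}{S + 1} \right) & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{26}
$$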
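
The reduction can be cross-checked numerically on a toy problem. The sketch below makes several assumptions that are not part of the derivation: a small dense random matrix stands in for $D_w$, $\gamma_5$ is taken diagonal, and the Möbius coefficients are set to $b_i = 1.7$, $c_i = 0.7$ on every slice. It compares the closed form in the last line of eq. (25) with the top-left 4D block of $P^{-1} D_{DW}^{-1}(1) D_{DW}(m) P$, which is related to $D^5_{ov}(m)$ by the similarity transformation with $R_1$ discussed in appendix B.

```python
import numpy as np

n, Ls, m = 4, 4, 0.06
rng = np.random.default_rng(3)
eye = np.eye(n)
g5 = np.diag(np.r_[np.ones(n // 2), -np.ones(n // 2)])
Pp, Pm = 0.5 * (eye + g5), 0.5 * (eye - g5)
Dw = 0.1 * rng.standard_normal((n, n))   # random stand-in for the Wilson matrix
b, c = np.full(Ls, 1.7), np.full(Ls, 0.7)

def block(i, j):
    """Index of the n x n block (i, j) inside the 5D matrix."""
    return slice(i * n, (i + 1) * n), slice(j * n, (j + 1) * n)

def domain_wall(mass):
    """Standard Domain Wall operator (alpha = 1) as a dense block matrix."""
    D = np.zeros((Ls * n, Ls * n))
    for i in range(Ls):
        Dplus, Dminus = b[i] * Dw + eye, c[i] * Dw - eye
        D[block(i, i)] = Dplus
        up = 1.0 if i < Ls - 1 else -mass     # boundary links carry -mass
        down = 1.0 if i > 0 else -mass
        D[block(i, (i + 1) % Ls)] = up * Dminus @ Pm
        D[block(i, (i - 1) % Ls)] = down * Dminus @ Pp
    return D

# Matrix P: P- on the diagonal, P+ on the cyclic superdiagonal.
P = np.zeros((Ls * n, Ls * n))
for i in range(Ls):
    P[block(i, i)] = Pm
    P[block(i, (i + 1) % Ls)] = Pp

# S = S_1 S_2 S_3 S_4 with S_i = -Q_{i-}^{-1} Q_{i+}, eq. (15).
S = eye.copy()
for i in range(Ls):
    Q_plus = g5 @ Dw @ (b[i] * Pp + c[i] * Pm) + eye
    Q_minus = g5 @ Dw @ (b[i] * Pm + c[i] * Pp) - eye
    S = S @ (-np.linalg.solve(Q_minus, Q_plus))

# Closed form of the (11) element, last line of eq. (25).
D4_formula = 0.5 * ((1 + m) * eye + (1 - m) * g5 @ np.linalg.solve(S + eye, S - eye))

# Top-left block of P^{-1} D_DW(1)^{-1} D_DW(m) P.
K = np.linalg.solve(domain_wall(1.0) @ P, domain_wall(m) @ P)
D4_numeric = K[:n, :n]
reduction_ok = np.allclose(D4_numeric, D4_formula, rtol=1e-6, atol=1e-10)
```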


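The matrix that acts as the variable for the polar decomposition can be
found by setting,

$$
\frac{S - 1}{S + 1} = \frac{1 - 1/S}{1 + 1/S}
= \frac{\prod_i (1 + a_i X_i) - \prod_i (1 - a_i X_i)}
       {\prod_i (1 + a_i X_i) + \prod_i (1 - a_i X_i)},
\tag{27}
$$

and therefore

$$
\frac{1}{S} = \frac{1}{S_1 S_2 S_3 S_4}
= \frac{(1 - a_1X_1)(1 - a_2X_2)(1 - a_3X_3)(1 - a_4X_4)}
       {(1 + a_1X_1)(1 + a_2X_2)(1 + a_3X_3)(1 + a_4X_4)}.
\tag{28}
$$

For each $i$, we determine $X_i$,

$$
\begin{aligned}
S_i^{-1} &= (1 - a_iX_i)(1 + a_iX_i)^{-1}, \\
Q_{i-}^{-1} Q_{i+} &= (a_iX_i + 1)(a_iX_i - 1)^{-1}, \\
Q_{i+}(a_iX_i - 1) &= Q_{i-}(a_iX_i + 1), \\
(Q_{i+} - Q_{i-})\, a_iX_i &= Q_{i+} + Q_{i-}, \\
a_i \gamma_5 \big( (b_i - c_i) D_w + 2 \big) \gamma_5 X_i &= (b_i + c_i)\, \gamma_5 D_w.
\end{aligned}
\tag{29}
$$

This results in,

$$
a_iX_i = (b_i + c_i)\, \gamma_5 D_w\, \frac{1}{2 + (b_i - c_i) D_w}. \tag{30}
$$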
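
As a quick plausibility check of this result, the following sketch builds $S_i^{-1} = -Q_{i+}^{-1} Q_{i-}$ directly from the definitions and compares it with $(1 - a_iX_i)(1 + a_iX_i)^{-1}$ evaluated from the kernel above. The random matrix standing in for $D_w$, the diagonal $\gamma_5$, and the coefficient values $b_i = 1.7$, $c_i = 0.7$ are assumptions made for illustration.

```python
import numpy as np

n = 4
rng = np.random.default_rng(4)
eye = np.eye(n)
g5 = np.diag(np.r_[np.ones(n // 2), -np.ones(n // 2)])
Pp, Pm = 0.5 * (eye + g5), 0.5 * (eye - g5)
Dw = 0.1 * rng.standard_normal((n, n))   # random stand-in for the Wilson matrix
b_i, c_i = 1.7, 0.7

# Definitions of Q_{i+} and Q_{i-} from appendix A.
Q_plus = g5 @ Dw @ (b_i * Pp + c_i * Pm) + eye
Q_minus = g5 @ Dw @ (b_i * Pm + c_i * Pp) - eye
S_inv = -np.linalg.solve(Q_plus, Q_minus)      # S_i^{-1} = -Q_{i+}^{-1} Q_{i-}

# Kernel a_i X_i = (b_i + c_i) gamma5 D_w / (2 + (b_i - c_i) D_w).
aX = (b_i + c_i) * g5 @ Dw @ np.linalg.inv(2 * eye + (b_i - c_i) * Dw)
S_inv_from_kernel = (eye - aX) @ np.linalg.inv(eye + aX)
kernel_ok = np.allclose(S_inv, S_inv_from_kernel)
```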


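We can therefore write,

$$
D^5_{ov}(m) =
\begin{pmatrix}
D^4_{ov}(m) & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\tag{31}
$$


B Computation of the 4D propagator

It follows directly from,

$$
\begin{pmatrix}
D^4_{ov} & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix},
\tag{32}
$$

or

$$
D^4_{ov}\, x_1 = b_1, \tag{33}
$$

that the 4D propagator is equal to $x_1$. We use eq. (12) and find

$$
F^{-1} L D_{DW}(m) R\, x = b, \tag{34}
$$

or

$$
R_1^{-1} P^{-1} D_{DW}^{-1}(1)\, D_{DW}(m)\, P R_1\, x = b. \tag{35}
$$

It follows from

$$
R_1 D^5_{ov} R_1^{-1}\, y =
\begin{pmatrix}
D^4_{ov} & 0 & 0 & 0 \\
S_2S_3S_4c_+ (D^4_{ov} - 1) & 1 & 0 & 0 \\
S_3S_4c_+ (D^4_{ov} - 1) & 0 & 1 & 0 \\
S_4c_+ (D^4_{ov} - 1) & 0 & 0 & 1
\end{pmatrix}
y = b,
\tag{36}
$$

that $y_1 = x_1$, i.e. that the 4D propagator is not affected by the
similarity transformation with $R_1$. Hence we can use

$$
P^{-1} D_{DW}^{-1}(1)\, D_{DW}(m)\, P y = b, \tag{37}
$$

or

$$
D_{DW}(m) P y = D_{DW}(1) P b, \tag{38}
$$

to determine the 4D propagator $y_1$.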